Exploiting Non-Uniform Access Time in Interconnect Sensitive Cache Partitioning
Authors
Abstract
Growing wire delay and clock rates limit the amount of cache accessible within a single cycle [3,13]. Conventional cache architectures assume that each level of the cache hierarchy has a uniform access time. As microprocessor technology advances, architects must decide how best to use increased resources while accounting for growing wire delay and rising clock rates. Because on-chip communication is costly [14], accesses to different physical locations within the cache return a range of hit latencies. This non-uniformity can be exploited to provide faster access to the cache blocks physically closest to the processing elements. As more cache is placed on chip, the access time of the closest cache bank becomes much smaller than that of the farthest bank. Previous research leveraged this non-uniformity by migrating the most frequently used cache sets into the closer banks. This work focuses on the placement of the cache banks and on the interconnection topology that allows each bank to communicate with the other banks and with the processor core. It evaluates the performance gain of non-uniform cache architectures, interconnected in a hypercube network, through a detailed cache model, an Alpha 21364 floorplan model, and an out-of-order processor simulator. The methodology generates candidate cache organizations and their timing information for a variety of cache requirements. Each organization is then manually laid out on the physical floorplan, where global wire lengths are extracted and modeled in HSpice to obtain the latency due to global wire delay. The resulting hit/miss access times, together with the global wire latencies, are simulated with SimpleScalar [11] on the SPEC2000 benchmark suite [9]. Initial results compare an S-NUCA cache with a mesh network against a D-NUCA cache with torus, mesh, and hypercube interconnection topologies and demonstrate a 43% performance improvement.
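The latency argument in the abstract can be illustrated with a toy model. This is not the paper's cache model: the bank count, per-bank and per-hop cycle costs, and the corner placement of the cache controller are invented here for illustration. In a hypercube, the hop count between two banks is the Hamming distance of their IDs, which shortens the average path relative to a mesh of the same size.

```python
# Toy comparison of average NUCA bank access latency under two topologies.
# All parameters are hypothetical, not taken from the paper.

def hamming(a, b):
    # Hop count between two hypercube nodes = Hamming distance of their IDs.
    return bin(a ^ b).count("1")

def mesh_hops(bank, side):
    # Manhattan distance from bank 0 (controller corner) in a side x side mesh.
    return (bank % side) + (bank // side)

def avg_latency(n_banks, hop_fn, bank_latency=4, hop_latency=2):
    # Access time = fixed bank lookup + per-hop wire/router delay.
    total = sum(bank_latency + hop_latency * hop_fn(b) for b in range(n_banks))
    return total / n_banks

N = 16  # 16 banks: a 4x4 mesh vs. a 4-dimensional hypercube
mesh = avg_latency(N, lambda b: mesh_hops(b, 4))
cube = avg_latency(N, lambda b: hamming(0, b))
print(f"mesh avg: {mesh:.2f} cycles, hypercube avg: {cube:.2f} cycles")
# → mesh avg: 10.00 cycles, hypercube avg: 8.00 cycles
```

Even in this crude model, the hypercube's lower average hop count (2 vs. 3 for the 16-bank case) translates directly into a lower average access latency, which is the effect the D-NUCA/hypercube comparison above exploits.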
Similar papers
Design and Evaluation of a Switch
Cache coherent non-uniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory, but they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory...
Using Elimination and Delegation to Implement a Scalable NUMA-Friendly Stack
Emerging cache-coherent non-uniform memory access (ccNUMA) architectures provide cache coherence across hundreds of cores. These architectures change how applications perform: while local memory accesses can be fast, remote memory accesses suffer from high access times and increased interconnect contention. Because of these costs, performance of legacy code on NUMA systems is often worse than t...
Improving Uniformity of Cache Access Pattern using Split Data Caches
In this paper we show that partitioning the data cache into array and scalar caches can improve the cache access pattern without having to remap data, while maintaining the constant access time of a direct-mapped cache and improving the performance of L1 cache memories. Using the four central moments (mean, standard deviation, skewness and kurtosis), we report on the frequency of accesses to cache sets and...
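The snippet above characterizes the uniformity of cache set accesses with four central moments. A minimal sketch of that statistic follows; the per-set access counts are invented example data, and degenerate inputs with all-equal counts (zero variance) are not handled.

```python
import statistics

def central_moments(counts):
    # Mean, standard deviation, skewness, and kurtosis of per-set access counts.
    n = len(counts)
    mean = statistics.fmean(counts)
    var = sum((c - mean) ** 2 for c in counts) / n
    std = var ** 0.5
    skew = sum((c - mean) ** 3 for c in counts) / (n * std ** 3)
    kurt = sum((c - mean) ** 4 for c in counts) / (n * var ** 2)
    return mean, std, skew, kurt

# Hypothetical access counts for four cache sets; one "hot" set skews the
# distribution, signalling a non-uniform access pattern.
mean, std, skew, kurt = central_moments([10, 12, 50, 8])
print(f"mean={mean:.1f} std={std:.2f} skew={skew:.2f} kurt={kurt:.2f}")
```

A positive skew here flags that a few hot sets absorb most accesses, which is exactly the kind of non-uniform pattern a split array/scalar cache aims to smooth out.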
Way adaptable D-NUCA caches
Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last level caches: by partitioning a large cache into several banks, with the latency of each one depending on its physical location and by employing a scalable on-chip network to interconnect the banks with the cache controller, the average access latency can be reduced with respect to a traditi...
Microarchitectural Resource Management Issues on Multicore NUMA Systems
Modern computer systems make use of multiple cores and NUMA (Non-Uniform Memory Access) architecture. Since multiple cores share various microarchitectural resources such as the LLC (Last Level Cache), memory interface, and interconnect, contention on these resources becomes a serious performance bottleneck. In this paper, we explore research issues on how to manage these resources to mitigate such con...